 dual model


Towards Understanding How Transformers Learn In-context Through a Representation Learning Lens

Neural Information Processing Systems

Pre-trained large language models based on Transformers have demonstrated remarkable in-context learning (ICL) abilities. Given just a few demonstration examples, the models can perform new tasks without any parameter updates. However, understanding the mechanism of ICL remains an open question. In this paper, we explore the ICL process in Transformers through the lens of representation learning. First, leveraging kernel methods, we derive a dual model for one softmax attention layer.
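To make the idea of a dual model for one softmax attention layer concrete, here is a minimal sketch built on a generic construction that may differ from the paper's own. It assumes the exponential kernel exp(k·q) inside softmax attention is approximated with positive random features φ (a Performer-style choice, not taken from the source). Under that approximation the attention output becomes the prediction of a linear model W φ(q), normalized by z·φ(q). Its weights W = Σ_i v_i φ(k_i)ᵀ are exactly one gradient-descent step (learning rate 1, starting from W = 0) on the hypothetical loss L(W) = -Σ_i v_iᵀ W φ(k_i). All function and class names are illustrative, and the usual 1/√d scaling is assumed to be folded into the query.

```python
import math
import random


def softmax_attention(q, keys, values):
    """Exact single-query softmax attention (1/sqrt(d) scaling assumed folded into q)."""
    scores = [math.exp(sum(a * b for a, b in zip(k, q))) for k in keys]
    total = sum(scores)
    dim_v = len(values[0])
    return [sum(s * v[j] for s, v in zip(scores, values)) / total for j in range(dim_v)]


def make_feature_map(dim, num_features, seed=0):
    """Positive random features with E[phi(x) . phi(y)] = exp(x . y)."""
    rng = random.Random(seed)
    omegas = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]
    scale = 1.0 / math.sqrt(num_features)

    def phi(x):
        half_sq_norm = sum(a * a for a in x) / 2.0
        return [scale * math.exp(sum(w * a for w, a in zip(om, x)) - half_sq_norm)
                for om in omegas]

    return phi


class DualLinearModel:
    """Hypothetical dual model: linear predictor W phi(x), normalized by z . phi(x).

    The gradient of L(W) = -sum_i v_i^T W phi(k_i) is -sum_i v_i phi(k_i)^T,
    so one step with lr = 1 from W = 0 gives W = sum_i v_i phi(k_i)^T.
    z = sum_i phi(k_i) plays the role of the softmax normalizer.
    """

    def __init__(self, phi, keys, values):
        self.phi = phi
        feats = [phi(k) for k in keys]
        num_features = len(feats[0])
        dim_v = len(values[0])
        # Weight matrix accumulated from the demonstration (key, value) pairs.
        self.W = [[sum(v[j] * f[r] for v, f in zip(values, feats)) for r in range(num_features)]
                  for j in range(dim_v)]
        self.z = [sum(f[r] for f in feats) for r in range(num_features)]

    def predict(self, q):
        fq = self.phi(q)
        denom = sum(a * b for a, b in zip(self.z, fq))  # always > 0: features are positive
        return [sum(a * b for a, b in zip(row, fq)) / denom for row in self.W]


# Toy demonstration: 8 key/value pairs, 4-dim keys, 3-dim values.
rng = random.Random(1)
keys = [[0.3 * rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(8)]
values = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(8)]
query = [0.3 * rng.gauss(0.0, 1.0) for _ in range(4)]

phi = make_feature_map(dim=4, num_features=20000, seed=0)
exact = softmax_attention(query, keys, values)
approx = DualLinearModel(phi, keys, values).predict(query)
```

With 20000 random features the two outputs should agree closely, and the approximation error shrinks as `num_features` grows. The point of the design is that the "training data" of the dual linear model are the in-context (key, value) pairs, so reading the attention output amounts to querying a model fitted on the demonstrations.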



e52ad5c9f751f599492b4f087ed7ecfc-AuthorFeedback.pdf

Neural Information Processing Systems

Due to limited time, we evaluated SNM [Yin and Neubig, 2017] on the Python dataset. SNM explicitly introduces grammar-rule constraints when generating ASTs. Its BLEU score is 10.62, similar to that of our Basic model, indicating that the CG task on this dataset is very challenging. In particular, all predictions of SNM are valid, whereas the percentage of valid code generated by the dual model is low (Table 1).

Since the CS and CG models are trained at the same time and their parameters are separate after the joint training (i.e., the two models solve their respective tasks separately after the joint training), each dual model has the same number of parameters as the basic model.
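The snippet does not state how validity of generated code is judged. A minimal sketch, assuming "valid" means the generated Python code parses without a syntax error, can use the standard-library `ast` module; the helper names `is_valid_python` and `valid_percentage` are hypothetical, and the check covers syntax only, not whether the code runs correctly.

```python
import ast
from typing import List


def is_valid_python(code: str) -> bool:
    """Return True if `code` parses as Python source (syntax only, not semantics)."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):  # ValueError covers e.g. null bytes in the source
        return False
    return True


def valid_percentage(predictions: List[str]) -> float:
    """Percentage of generated snippets that are syntactically valid Python."""
    if not predictions:
        return 0.0
    return 100.0 * sum(is_valid_python(p) for p in predictions) / len(predictions)


predictions = [
    "def add(x, y):\n    return x + y",  # valid
    "def add(x, y) return x + y",        # missing colon
    "print('ok')",                       # valid
    "for i in range(3):",                # missing indented body
]
score = valid_percentage(predictions)  # 50.0
```

A grammar-constrained decoder such as SNM would score 100% on a metric like this by construction, which is why syntactic validity is worth reporting alongside BLEU.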



Code Generation as a Dual Task of Code Summarization

Neural Information Processing Systems

On the other hand, CG is an indispensable process in which programmers write code to implement specific intents [Balzer, 1985]. Proper comments and correct code can substantially improve programmers' productivity and enhance software quality.



FedFixer: Mitigating Heterogeneous Label Noise in Federated Learning

arXiv.org Artificial Intelligence

Federated Learning (FL) heavily depends on label quality for its performance. However, the label distribution among individual clients is always both noisy and heterogeneous. Under heterogeneous label noise, client-specific samples also incur high loss, which makes them hard to distinguish from noisy-label samples and undermines existing label-noise learning approaches. To tackle this issue, we propose FedFixer, in which a personalized model cooperates with the global model to select clean client-specific samples. In this dual-model setup, updating the personalized model only locally can cause it to overfit noisy data because local samples are limited, which in turn degrades both the local and global models. We mitigate this overfitting in two ways. First, a confidence regularizer alleviates the impact of unconfident predictions caused by label noise. Second, a distance regularizer constrains the disparity between the personalized and global models. We validate the effectiveness of FedFixer through extensive experiments on benchmark datasets. The results demonstrate that FedFixer filters noisy-label samples well across different clients, especially in highly heterogeneous label-noise scenarios.
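The two regularizers can be combined into one objective for the personalized model. The following is a minimal sketch under stated assumptions. The abstract does not give the confidence regularizer's exact form, so a prediction-entropy penalty (which grows for unconfident, near-uniform predictions) stands in for it here. The distance regularizer is assumed to be a squared L2 distance between flattened parameter vectors. The weights `beta` and `mu` and all function names are hypothetical, not FedFixer's.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], label: int) -> float:
    """Standard cross-entropy for one sample, given predicted class probabilities."""
    return -math.log(probs[label])


def entropy(probs: Sequence[float]) -> float:
    """Stand-in confidence regularizer: high entropy means an unconfident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def squared_distance(personal: Sequence[float], global_: Sequence[float]) -> float:
    """Assumed distance regularizer: squared L2 gap between parameter vectors."""
    return sum((a - b) ** 2 for a, b in zip(personal, global_))


def personalized_loss(probs, label, personal_params, global_params, beta=0.1, mu=0.01):
    """Hypothetical per-sample loss for the personalized model."""
    return (cross_entropy(probs, label)
            + beta * entropy(probs)
            + mu * squared_distance(personal_params, global_params))


# Uniform prediction (maximally unconfident) and a personalized model drifting from the global one.
loss = personalized_loss([0.5, 0.5], 0, [1.0, 2.0], [1.0, 0.0])
```

Here the loss is log 2 + 0.1·log 2 + 0.01·4. Raising `mu` pulls the personalized model toward the global one, which is the intended brake on overfitting to a client's few noisy samples.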


Federated Semi-Supervised Learning with Annotation Heterogeneity

arXiv.org Artificial Intelligence

Federated Semi-Supervised Learning (FSSL) aims to learn a global model from different clients in an environment with both labeled and unlabeled data. Most existing FSSL work assumes that both types of data are available on each client. In this paper, we study a more general FSSL setup with annotation heterogeneity, where each client can hold an arbitrary percentage (0%-100%) of labeled data. To this end, we propose a novel FSSL framework called Heterogeneously Annotated Semi-Supervised LEarning (HASSLE). It is a dual-model framework in which two models are trained separately on labeled and unlabeled data, so it can be applied directly to a client with any labeling percentage. Furthermore, we propose a mutual learning strategy called Supervised-Unsupervised Mutual Alignment (SUMA) for the dual models within HASSLE, which uses global residual alignment and model proximity alignment. As a result, the dual models can implicitly learn from both types of data across different clients, even though each dual model is trained locally on only a single type of data. Experiments verify that the dual models in HASSLE learned by SUMA can mutually learn from each other, thereby effectively utilizing the information of both types of data across different clients.
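Because the two models in a dual-model framework like HASSLE are trained on different kinds of data, a client with any labeling percentage can take part by routing its samples to the appropriate model. The sketch below illustrates only that routing step. It assumes each sample is an `(input, label)` pair with `None` marking an unlabeled example; the model names and helper functions are hypothetical, and the SUMA alignment terms are omitted entirely.

```python
from typing import Any, List, Optional, Tuple

Sample = Tuple[Any, Optional[int]]  # (input, label); label is None if unannotated


def split_client_data(samples: List[Sample]):
    """Separate a client's data into labeled pairs and unlabeled inputs."""
    # Compare against None explicitly so that class label 0 still counts as labeled.
    labeled = [(x, y) for x, y in samples if y is not None]
    unlabeled = [x for x, y in samples if y is None]
    return labeled, unlabeled


def local_training_plan(samples: List[Sample]) -> List[Tuple[str, int]]:
    """Which of the two (hypothetical) models a client updates, and on how many samples."""
    labeled, unlabeled = split_client_data(samples)
    plan = []
    if labeled:
        plan.append(("supervised_model", len(labeled)))
    if unlabeled:
        plan.append(("unsupervised_model", len(unlabeled)))
    return plan


fully_labeled = local_training_plan([("a", 0), ("b", 1)])          # 100% labeled
mixed = local_training_plan([("a", 0), ("b", None), ("c", None)])  # about 33% labeled
fully_unlabeled = local_training_plan([("a", None)])               # 0% labeled
```

A fully labeled client updates only the supervised model and a fully unlabeled client only the unsupervised one. This is why, in the abstract's framing, cross-client knowledge transfer has to come from an alignment strategy such as SUMA rather than from local training alone.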